.ce
                             The SE Editor

.fi
.rm 76

This file announces the release of yet another of the derivatives of the 'e'
editor.  The highlights of this version are:

1) Utilizes up to 500 kb of RAM for text storage, while functioning with as
little as 6 kb of allocatable memory. 2) The efficiency of the virtual disc
system has been doubled by adding a stale page directory.  The speed of disc
reads has been improved. 3) An embedded runoff function will reformat
internal text per dot commands. 4) A text push stack has been added for
pushing and popping lines. 5) The undo capability has been extended to
include redo. 6) The program supports free cursor movement. 7) Numerous
enhancements to the command and display structure have been made, while
retaining Wordstar [1] compatibility where feasible and reasonable.  Many of
the enhancements are cosmetic, not affecting the command structure, but
improving the visibility or convenience of use.

The starting point for this system was the GED version, CUG 199.  The
contributors to that version were G. Nigel Gilbert, James W. Haefner, and
Mel Tearle.  Several small errors have been corrected.  The new version
compiles under Microsoft 4.00.  This compiler allows selective use of far
pointers, opening up the possibilities of the large memory model without its
usual inefficiency.  Some effort has been made to otherwise maintain
compatibility with the DeSmet starting point, but the thoroughness of the
compatibility has not been tested.

The architecture remains that of the earlier versions.  It is a good
architecture, providing a solid foundation for further enhancements and
additions.  Due to the ancestry of the program, the architecture is oriented
to the needs of a slow remote terminal.  In the interest of portability,
that design philosophy has been retained, even though some shortcuts could
have been taken with the direct access to the video RAM in the PC's.  The
shortcuts would compromise the portability of the program.

There are preprocessor directives in the header file which can be changed to
remove all occurrences of the near and far keywords, making the code
compatible with compilers which do not support mixed memory models.  Most
users will not find the capabilities of the program running under the small
memory model overly restrictive.  The virtual memory system is effective in
masking the shortcomings of a small RAM.
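
The directives themselves are in ged.h and are not reproduced here.  As a
minimal sketch of the idea only: the switch name NO_MIXED_MODEL and the
textptr typedef below are hypothetical, chosen for illustration.  Defining
the two keywords as empty macros turns every far or near declaration into a
plain one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical switch; the name actually used in ged.h may differ. */
#define NO_MIXED_MODEL 1

#if NO_MIXED_MODEL
#define far             /* expands to nothing: ordinary pointers */
#define near
#endif

/* With the keywords defined away this is simply "char *". */
typedef char far *textptr;

/* Length of a string held in (possibly far) text storage. */
size_t text_length(const char far *s)
{
    size_t n = 0;

    assert(s != NULL);
    while (s[n] != '\0')
        n++;
    return n;
}
```

With the switch set to 0 on a compiler that supports mixed models, the same
source would keep its far pointers.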

A tidying up operation has been performed, including the addition of many
comments.  The new code, now in its third compiler port, should be mostly
vanilla.  An attempt has been made to further segregate those functions
which are system dependent (IBM PC, MSDOS) to simplify rework for a port to
a different system.  The strategy in porting the system to an entirely
different environment is to replace the system interface routines
altogether.

The program is configured for editing a 2.4 megabyte English dictionary
consisting of 10000 lines of 253 characters each.  The maximum file size
will be less for 80-character lines.  The maximum number of lines in a file
is 16383.

The line limit may be less for small memory systems, but remains adequate
for most work in any plausible system. 1000 line files can be edited with
only 8 kb of allocatable RAM.  A line pointer array is kept in RAM, so all
line jumps are instantaneous if the text is in RAM, or take just the disc
access time (without searching) if the whole document does not fit in RAM.
The virtual
memory system keeps the most recently used pages in RAM, as they are the
most likely to be used or viewed again.  Global string searches have little
effect on the established page priorities.  They normally search all pages
and are given special consideration so that they do not upset the virtual
memory priorities established by earlier editing and display functions.
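
The replacement policy described above can be sketched as a use stamp per
RAM slot.  This is an illustration under assumptions, not the code in
swap.c: the names slots, clock_now, use_slot and victim are hypothetical,
and the way SE actually exempts searches is not stated in this file.  Here
a search simply declines to update the stamp.

```c
#include <assert.h>

#define NSLOTS 4                  /* assumed number of RAM page slots */

struct slotinfo {
    int           page;           /* virtual page held in this slot */
    unsigned long stamp;          /* larger means used more recently */
};

struct slotinfo slots[NSLOTS];
unsigned long clock_now;

/* Record a use of slot i.  A global search passes touch = 0, so the
   priorities set by earlier editing and display are left alone. */
void use_slot(int i, int touch)
{
    assert(i >= 0 && i < NSLOTS);
    if (touch)
        slots[i].stamp = ++clock_now;
}

/* Choose the least recently used slot as the one to give up. */
int victim(void)
{
    int i, best = 0;

    for (i = 1; i < NSLOTS; i++)
        if (slots[i].stamp < slots[best].stamp)
            best = i;
    return best;
}
```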

Earlier versions unconditionally wrote a page back to disc if the RAM slot
was needed for another use.  This version does not do a disc write if the
virtual memory page is unchanged since being rolled in from disc.  In a
further refinement, the line pointer array and text pages are allocated
beginning from opposite ends of memory, so that they do not collide until
RAM is exhausted.  Collision was immediate in the earlier versions,
resulting in unnecessary disc thrashing during the initial read.  If enough
RAM is available the new version will not create a temporary disc file at
all.
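
The write-avoidance rule amounts to a changed flag on each resident page.
The following is a sketch under assumptions: the structure layout, the page
size, and the names reuse_slot and write_page are hypothetical, and the
disc write is simulated by a counter.

```c
#include <assert.h>
#include <string.h>

#define PAGE_SIZE 1024            /* assumed page size */

struct slot {
    int  page;                    /* resident virtual page, -1 if none */
    int  dirty;                   /* nonzero if changed since rolled in */
    char data[PAGE_SIZE];
};

int disc_writes;                  /* counts simulated disc writes */

/* Stand-in for the real disc write. */
static void write_page(const struct slot *s)
{
    (void)s;
    disc_writes++;
}

/* Give the slot to new_page, writing the old page back only if it
   has been changed since it was read. */
void reuse_slot(struct slot *s, int new_page)
{
    assert(s != NULL);
    if (s->page >= 0 && s->dirty)
        write_page(s);
    s->page = new_page;
    s->dirty = 0;
    memset(s->data, 0, sizeof s->data);
}
```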

Tree directories are supported.  An error in earlier versions which made
filespecs of the form ..\filespec unusable has been corrected.  The
temporary disc file normally goes in the default directory of the default
drive.  The -D invocation option can be used to place the temporary file on
any drive.  In browsing through the files on floppy discs from a hard disc
system, nothing is written to the floppy disc unless directed there from the
keyboard.  Archived files can be studied with little risk of accidental
modification.

The program updates the screen before the data base, making it seem faster
than it is in some cases.  In the case of line deletions or insertions, all
the line pointer array beyond the modified point is moved.  The processing
delay is seen if a second key is entered during the move.  The delay becomes
noticeable and objectionable at about 5000 lines on a 5 MHz PC.  The delay
is not objectionable (and rarely seen) with 16000 line documents on an 80286
or faster system.
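
The cost comes from a block move of pointers.  A sketch of the operation,
assuming a flat array of ordinary pointers (the names lineptr, insert_line
and delete_line are hypothetical, and the real array may use far pointers):

```c
#include <assert.h>
#include <string.h>

#define MAXLINES 16383            /* line limit quoted in this file */

char *lineptr[MAXLINES];          /* one pointer per line of text */
int   nlines;

/* Insert a line before index at; every pointer from at onward moves up. */
int insert_line(int at, char *text)
{
    assert(at >= 0 && at <= nlines);
    if (nlines >= MAXLINES)
        return -1;
    memmove(&lineptr[at + 1], &lineptr[at],
            (size_t)(nlines - at) * sizeof lineptr[0]);
    lineptr[at] = text;
    nlines++;
    return 0;
}

/* Delete the line at index at; every pointer beyond it moves down. */
void delete_line(int at)
{
    assert(at >= 0 && at < nlines);
    memmove(&lineptr[at], &lineptr[at + 1],
            (size_t)(nlines - at - 1) * sizeof lineptr[0]);
    nlines--;
}
```

The amount moved is proportional to the number of lines after the modified
point, which is why the delay shows up only in long documents.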

On a 5 MHz PC the primary file is read at the rate of 5200 characters per
second from fixed disc; at 3000 characters per second from floppies.  The
string search operation proceeds at 30,000 characters per second if the
material is in RAM.  The search rate exceeds 200,000 characters per second
on more recent systems.

Learning to use any word processor effectively represents a significant
investment in time.  For that reason I have tried to adopt or retain the
Wordstar keyboard layout for the frequent and habitual keystrokes.  But in
the more complex and less frequent functions, and when there is ample
visual prompting or feedback, significant deviations have been made.

Some editing programs trap the cursor within the portion of the screen
containing text for reasons which are surely without merit.  The package has
been modified to allow free cursor movement.  The horizontal domain of the
cursor lies between columns 1 and 255.  The ^D, right arrow, and ^] (end of
line) cursor positioning commands will move the cursor past the right end of
the line.  Doing so creates temporary spaces at the end of the line, but
they are removed before the line is stored.  Editing is performed as though
each line had spaces all the way to the right.  Free cursor movement is a
great convenience in editing C code because the cursor will stay at a fixed
indentation level when moved vertically.
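
The removal of the temporary spaces before a line is stored can be sketched
in a few lines.  The function name strip_trailing is hypothetical; the
corresponding routine in store.c may be organized differently.

```c
#include <assert.h>
#include <string.h>

/* Remove trailing spaces in place and return the new length. */
size_t strip_trailing(char *line)
{
    size_t n;

    assert(line != NULL);
    n = strlen(line);
    while (n > 0 && line[n - 1] == ' ')
        n--;
    line[n] = '\0';
    return n;
}
```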

The -S option restores the earlier mode.  In this mode the cursor is trapped
within those regions already containing text (or trailing spaces).  In this
mode it is unnecessary to strip trailing blanks before lines are stored, and
they are not.  That is what the mode is used for.  But otherwise, the first
time a file passes through SE without the -S option it may shrink in size
without any editing activity.  The thing lost is trailing whitespace.

The GED version had a full undo capability, but I quickly discovered that
after undoing more than two lines I had always forgotten what I had changed
and why.  The changes I saw occurring on the screen in time reversal didn't
make any sense.  To overcome that problem I modified the undo algorithm to
be reversible and with that there is redo.  The same algorithm and the same
code are used for both.  The complete algorithm is surprisingly compact.  By
moving back and forth along the edit trail it is usually possible to
recognize the correct restoration point many lines back.  Undo is nice, but
it is redo that makes it work.
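
One way such a reversible scheme can work is to exchange each history entry
with the line it refers to, so that a single routine serves both
directions.  The sketch below is an assumption about the general technique,
not a copy of hist.c.  It handles changed lines only (no insertions or
deletions), and all names in it are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define NHIST    100              /* assumed history depth */
#define DOCLINES 16               /* toy document size */

struct change {
    int   line;                   /* line the entry refers to */
    char *text;                   /* the other version of that line */
};

char *doc[DOCLINES];              /* toy document: one pointer per line */
struct change hist[NHIST];
int nhist;                        /* entries recorded */
int pos;                          /* entries currently applied */

/* Exchange a history entry with its document line. */
static void swap_entry(struct change *c)
{
    char *t = doc[c->line];

    doc[c->line] = c->text;
    c->text = t;
}

/* Replace a line, remembering the old text.  Any redo trail is lost. */
void change_line(int line, char *newtext)
{
    assert(line >= 0 && line < DOCLINES && pos < NHIST);
    hist[pos].line = line;
    hist[pos].text = doc[line];
    doc[line] = newtext;
    nhist = ++pos;
}

/* Step back one change; returns 0 at the start of the trail. */
int undo(void)
{
    if (pos == 0)
        return 0;
    swap_entry(&hist[--pos]);
    return 1;
}

/* Step forward one change; returns 0 at the end of the trail. */
int redo(void)
{
    if (pos == nhist)
        return 0;
    swap_entry(&hist[pos++]);
    return 1;
}
```

Because the exchange is its own inverse, undo and redo differ only in the
direction in which they walk the trail.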

As each line is undone or redone it appears in reverse field on the screen.
The cursor follows the undone and redone lines about the document.  So long
as the undo capacity of the program is not exceeded (it runs 50 to 100
lines, depending on the activity), the restored text is guaranteed to be
identical with the original.  There are no restrictions on the complexity of
the undo steps, and no restrictions on changing changed lines.

When the undo mode is entered with ^- (ctrl-minus), the program
automatically locks out all editing commands.  That is necessary because as
soon as any change is made the redo capability from that point forward is
lost.  When in the undo/redo mode, which is prompted, the + key (not
ctrl-plus) becomes the redo key.  The lockout makes it safe to browse
backward and forward on the edit trail.  The global search and replace
operations can be undone.  The insertion of a disc file with ^KR can also be
undone.  All editing operations can be undone.

Although the stack operations have some restrictions, they are nevertheless
a useful and heavily used operation.  They provide an easy way of moving a
line of code, of transposing lines, and of duplicating lines.  Pushing
several lines then popping them elsewhere is a convenient way to move a
small block.

The text stack shares a data base with the undo function, so the undo and
pop operations have some conflict.  Pops can be undone, but if the editing
operation which did the push is undone then the associated pop will find the
stack empty.  The ^O and ^P stack operations pop the lines pushed with ^Y.
^O pops a copy of the line; ^P is the conventional pop.  The cursor must be
in column 1 for a ^Y push to occur.  The stack will hold 100 lines.
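
The difference between the two pops can be sketched as follows.  This is a
stand-alone illustration with hypothetical names; it does not model the
data base shared with the undo function, and the behavior of SE when the
stack is full is not stated here, so the sketch simply refuses the push.

```c
#include <assert.h>
#include <stddef.h>

#define STACKMAX 100              /* capacity quoted in this file */

const char *stack[STACKMAX];
int depth;

/* ^Y: push a line.  Returns 0 if the stack is full. */
int push_line(const char *line)
{
    assert(line != NULL);
    if (depth == STACKMAX)
        return 0;
    stack[depth++] = line;
    return 1;
}

/* ^O: return the top line but leave it on the stack. */
const char *pop_copy(void)
{
    return depth > 0 ? stack[depth - 1] : NULL;
}

/* ^P: return the top line and remove it. */
const char *pop_line(void)
{
    return depth > 0 ? stack[--depth] : NULL;
}
```

Repeating pop_copy yields the same line each time, which is what makes ^O
useful for duplicating a line.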

A few editing operations other than ^Y (such as line concatenation) affect
the stack also and can result in unexpected items on the stack.  Don't do
too much editing if items are to be popped from the stack.

The ^J linejump command has several prompted options.  It will jump to the
line last changed.  That is a good way to return to the point of editing
after browsing elsewhere.  It is also at times a good way of jogging the
memory, even to the extent of determining if anything at all has been
changed.

Up to three lines can be marked with the ^JS (set mark) option.  A
subsequent ^JM (jump to mark) will return to that point.  If the cursor is
already on a marked line, then a second ^JM moves it to the line marked
before that one.  If a fourth line is marked then the oldest marked line
becomes unmarked.  If the cursor is on the oldest marked line then it moves
to the newest marked line.  A ^JJ will jump to the last jump.  The most
common form is ^Jn, where n is a line number.  All jumps are instantaneous,
regardless of the document size.  No searching is involved.  The marked
lines are not modified; it is the line number which is stored.  The list of
line numbers is automatically adjusted if lines are inserted or deleted
before that point.
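
The mark rules above can be sketched as a short list of line numbers kept
newest first.  The names are hypothetical, and two details are assumptions:
a jump from an unmarked line goes to the newest mark, and a mark on a line
that is itself deleted is not handled.

```c
#include <assert.h>

#define NMARKS 3

int mark[NMARKS];                 /* mark[0] is the newest mark */
int nmarks;

/* Mark a line; the oldest mark drops off when a fourth is set. */
void set_mark(int line)
{
    int i;

    assert(line > 0);
    if (nmarks < NMARKS)
        nmarks++;
    for (i = nmarks - 1; i > 0; i--)
        mark[i] = mark[i - 1];
    mark[0] = line;
}

/* Line to jump to from cur: the next older mark if cur is marked
   (wrapping from oldest to newest), otherwise the newest mark. */
int jump_mark(int cur)
{
    int i;

    if (nmarks == 0)
        return cur;
    for (i = 0; i < nmarks; i++)
        if (mark[i] == cur)
            return mark[(i + 1) % nmarks];
    return mark[0];
}

/* Keep marks on the same text when n lines are inserted (n > 0) or
   deleted (n < 0) at line at, ahead of the marked lines. */
void adjust_marks(int at, int n)
{
    int i;

    for (i = 0; i < nmarks; i++)
        if (mark[i] >= at)
            mark[i] += n;
}
```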

The other means of maneuvering about a large document is with the string
search.  With a 30,000 to 200,000+ character per second search rate the
technique becomes more attractive than when using the slower commercial
packages.  The F3 and F4 keys can be used to rock back and forth between two
widely separated occurrences of a word, making intercomparison of the two
areas easy.  Search keys are remembered until changed, regardless of other
activities.  The Wordstar ^QF, ^QA, and ^L commands have also been retained
for compatibility.

All search and search/replace operations wrap at beginning and end of file.
In a forward search the first possible match begins at the character
immediately to the right of the cursor.  The last possible match is at the
initial cursor position, but only after the search has proceeded to the last
line, restarted at the first line, and finally returned to the initial
position.  Complementary rules apply to backward searches.
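
The forward wrap rule can be sketched over an array of null terminated
lines.  This is an illustration, not the code in search.c: the function
name and argument layout are hypothetical, columns are counted from 0, and
the standard strstr stands in for whatever matching SE performs.

```c
#include <assert.h>
#include <string.h>

/* Forward search with wraparound over nlines lines of text.
   On entry *line and *col give the cursor; on success they give the
   match.  Returns 1 if found, 0 if the key occurs nowhere. */
int search_forward(const char *const text[], int nlines,
                   const char *key, int *line, int *col)
{
    int i, start = *line;
    size_t from = (size_t)*col + 1;   /* first candidate: right of cursor */

    assert(nlines > 0 && key[0] != '\0');
    for (i = 0; i <= nlines; i++) {   /* the start line is visited twice */
        int ln = (start + i) % nlines;
        const char *s = text[ln];
        const char *hit;

        if (from > strlen(s))         /* cursor may lie past the line end */
            from = strlen(s);
        hit = strstr(s + from, key);
        /* on the final visit accept nothing beyond the cursor column */
        if (hit != NULL && (i < nlines || (hit - s) <= *col)) {
            *line = ln;
            *col = (int)(hit - s);
            return 1;
        }
        from = 0;
    }
    return 0;
}
```

When the only occurrence is the one under the cursor, the search goes all
the way around and lands on the starting position again.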

For moving a few pages from the current line the easiest way is with the ^R
and ^C keys, which are the same as the PgUp and PgDn dedicated keys.  With
dense text the screen refresh time is 350 milliseconds on a 5 MHz PC.  This
time is near the minimum achievable with the hardware.  The program avoids a
complete screen rewrite whenever possible by using the faster scroll
function.

The most common use of the F6 "center window" command is when a few of the
displayed lines extend offscreen to the right.  If the cursor is already
near the right of the screen, and if the line ends are not too far offscreen
to the right, it will move the window right just far enough to view the last
portion of the lines.

A -p option has been added to allow the importation of Wordstar document
mode files.  This option is probably useful with word processors from some
other manufacturers also.  The only function performed by the -p option is
to zero the high order bit of each ASCII character in the primary file as it
is read.  Some editing will be required after the conversion, but the input
is at least legible.  SE does not create any hidden control characters for
the disc output.  The only format control function is the dot commands,
which are edited like any other text.
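
The operation performed by -p is a single mask on each byte read.  A
sketch, with a hypothetical function name:

```c
#include <assert.h>
#include <stddef.h>

/* Clear the high order bit of every byte in buf. */
void strip_high_bits(unsigned char *buf, size_t n)
{
    size_t i;

    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++)
        buf[i] &= 0x7F;
}
```

A byte such as 0xE4 becomes 0x64, the letter d, while bytes already below
0x80 pass through unchanged.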

SE will read and display something for any possible disc file format, even
binary files.  The binary capability has no value, but the text in the files
from any other word processor can at least be seen.  The read operation is
terminated by an end of file character, however, and some packages do
include binary or graphic data, which might contain an internal end of file
byte.

SE is capable of producing files with ASCII codes in the range 0x80 to 0xFF.
These codes, which include the graphic symbols, are entered by holding down
the Alt key and entering the 3-digit decimal code on the numeric keypad (The
translation is performed by BIOS, and works for all programs).  A method is
also provided of entering codes 0 through 0x1F, although the use of the
control codes in text is not encouraged.

Input files are automatically detabbed in this version.  The tab key is a
cursor positioning command.  Its use does not alter the text.  Earlier
versions of the program had the capability of editing files with tabs, and of
retaining the tabs.  The only advantage to that is that it saves disc space,
while complicating the manual interaction.  A capability of entabbing the
output file at the time it is written would be a useful enhancement, because
it is a simple form of file compression for sparse data.
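
Detabbing on input can be sketched as follows.  The tab interval of 8 is an
assumption, since the setting used by SE is not stated in this file, and
the function name detab is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TABSTOP 8                 /* assumed tab interval */

/* Expand tabs in src into dst, whose capacity cap includes the
   terminator.  Returns the length written, or (size_t)-1 if dst is
   too small. */
size_t detab(const char *src, char *dst, size_t cap)
{
    size_t col = 0;

    assert(src != NULL && dst != NULL && cap > 0);
    for (; *src != '\0'; src++) {
        if (*src == '\t') {
            size_t stop = (col / TABSTOP + 1) * TABSTOP;

            if (stop >= cap)
                return (size_t)-1;
            while (col < stop)
                dst[col++] = ' ';
        } else {
            if (col + 1 >= cap)
                return (size_t)-1;
            dst[col++] = *src;
        }
    }
    dst[col] = '\0';
    return col;
}
```

Entabbing the output would be the reverse pass: runs of spaces that end on
a tab stop are replaced by a single tab character.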

A consideration in the command changes has been that there should be only
one program mode.  The problem with multiple modes is that, assuming
independence, there are 2^n states to remember.  I am convinced by
experience that I cannot handle more than n=0.  The natural language editing
is performed in the same mode as the editing of computer programs.

Now it is true that the keyboard designers had in mind that the character
replace mode should be the primary mode, so they provided an insert key for
inserting characters.  That seems to make sense, but it really doesn't.  The
prime mode is the insert mode, and the insert key is used to leave it.  Most
will prefer it that way, but the rules are easily inverted if the other way
is more familiar.

An attempt has been made to structure the error messages and prompts so that
a new user will eventually be led into the nooks and crannies of the
features, while at the same time not cluttering up the screen with all the
possibilities at the top level.  The F1 help key shows the basic commands.
A different help display is shown in the paragraphing mode.

The backspace key is a destructive backspace.  The left arrow and the ^S
keys are for moving the cursor left.  The backspace key is for correcting
typing mistakes.  The usage is not the same as that of Wordstar, but the
destructive backspace is nearly a universal standard elsewhere.

The method chosen for paragraph reforming is familiar yet unusual.  A runoff
program has been embedded in the program.  Its functions are conventional
except that it operates directly on the memory image and, unless otherwise
requested, leaves the dot commands in.  Thus it is the master file which is
reformatted, after which editing of it can resume.  This technique provides
a flexibility and power not readily achievable by other means.  It also has
the advantage of not embedding hidden control characters in the text.  Both
the command and the effect of the command can be seen at the same time.

The runoff function is used in more than one way.  First of all, if the
master file is being reformed then the dot commands should be restricted to
those which are reversible by automated means.  Those which are reversible
are flagged in the help display for the ^QP reform function.  A function
such as changing the right margin is reversible by simply changing the dot
command argument back to its original value and reforming.  A function such
as creating a header line with page number is not automatically reversible,
because the command creates new lines which must be removed manually if they
are not wanted.

So if the objective is to actually produce a hard copy, the full set of dot
commands is used and the result written to a temporary print file with the
^KW command.  The F9 key is used to exit the program without changing the
master file.  The same method is useful for seeing what the final will look
like on paper, but without the need of actually making a hard copy.  The
screen will become a direct image of the result.  The restricted set of dot
commands reversibly tidy up the material for further editing.  They make the
material easier to read, and easier to enter because the paragraphs can be
ragged.  Dot command defaults are provided, so that simple paragraph
reforming can be accomplished without the use of any dot commands.

The program does not distinguish between document files and non-document
files.  Document files will usually contain dot commands, but that is not
required.  It also makes no difference whether the right end of a line is
terminated by a carriage return or by some other cursor positioning command.
The program does not use carriage returns or line feeds internally.  All
strings are internally null terminated, and no memory is retained of the
method of line termination.  If the line looks terminated on the screen then
that is the way that it is.



.ce
                                Future work

No spelling correction capability is bundled with the package, but that
probably would not be a good idea anyway, as it would needlessly complicate
the code maintenance problem.  See CUG 217 and 218 for spelling correction.
Hyphenation is closely related to spelling correction, and is also not
bundled with the editing function.

It is not clear at this point that it would be a good idea to build the
quirks of the individual printer manufacturers into a general purpose
editing program.  For printers of the complexity of the laser printers a
fully developed runoff program is a major program in its own right.  The
same is true of phototypesetting.  The output of SE is nevertheless usable
for these purposes with a suitably rich set of dot commands and a
post-processor.  To that end, SE ignores dot commands that it does not
recognize.  Thus with a capability of true proportional spacing the screen
image would not be quite the same as the printer output, but it would
nevertheless be close enough to prevent most command and editing blunders.

Another feature which would be nice at some time in the future is a
configuration disc file.  This file should be an ordinary text file so that
an arcane configuration mode is not needed to change it.  It should contain
all the invocation options and the keyboard translation table.  The
possibility of finding any one keyboard layout to which everyone agrees is
remote, so the file should contain a key translation table.

There is a need for new natural language commands.  A means of pushing and
popping whole sentences without regard for line boundaries would be useful,
for example.  There are not many convenient and portable keys left for the
natural language functions.  A mode change which is good for one command
only is one possible way of extending the command structure.  A separate
program mode would not be a good way, because computer programs contain
natural language comments.  I have found on other systems that the escape
commands with one or two letter command names are not so inconvenient as it
might seem.  Remember that a control key counts as perhaps 1.5 keystrokes
rather than 1.  That is to be compared to the 2 or 3 keystrokes for an
escape command.  The visible feedback provided by the command line also
gives it an advantage for functions where the visibility is relevant.  The
same feedback would be distracting when it is not relevant.  The ALT and ESC
keys are available for the natural language extensions.

Another extension needed by the program is a true macro capability.  The
stream editors from the Unix environment illustrate the power of the
technique.  The utility of push stacks is well established as a method of
implementing automatic procedures, suggesting that the text stack capability
of the program be extended to include macros.  The first step would then be
to expand the stack capability to include items of type word, type line, and
type column.  The type word and type column items would displace
horizontally when popped; the type line would displace vertically.  Any item
which can be deleted as an entity is a candidate for a stack item.

The next essential element is a cursor trapping capability when in the macro
mode.  The popped item would appear in reverse field with the cursor
somewhere within it.  The macro-driven cursor positioning commands, mostly
in the form of generalized string searches, would be limited to that region.
A macro step would fail when the string is not found, causing the alternate
macro path to be taken.  Two stacks are probably needed, with a simple way
of toggling between them.  Problems as complex as alphabetizing a list and
conditional block replacement based on elaborate rules can be accomplished
with this technique.

In order that the macros be alterable in the same way as any text,
equivalents of the control character commands would have to be provided in
readable form so that the macro is composed of visible characters which can
be edited without complication.




.ce
                              Getting started

The file keys.doc contains the operating instructions.  The program compiles
under the Microsoft 4.00 compiler.  The batch procedures use a \obj
subdirectory to avoid cluttering the main directory.  The build.bat
procedure will reconstruct all the object files (not included).  The one
assembly language file is included in both source and object form for those
without an assembler.  The link edited and executable result is in se.exe
(included).  Edit a program with the minimum command line SE FILESPEC.

The program has been used on systems ranging from a 5 MHz PC clone to a 25
MHz 80386.  It should run on most any system supporting DOS 2.0 or later.
You will want to reconfigure the video parameters in ged.h for a color
monitor, but the program is usable as-is on almost any system.  The timed
delays are independent of CPU speed.  The program does make direct access to
the video memory, but that usually does not cause compatibility problems.
The 80386 may virtualize these direct accesses on the more advanced systems,
but it still works.



.nf
The main files are:

ged.h     header file
se.c      top level
edit.c    basic editing operations
search.c  string search
roff.c    paragraphing
block.c   block operations
ged1.c    disc directory, options
hist.c    undo, push/pop
paint.c   screen output
ged10.c   disc library functions
ged5.c    open and close files
ged7.c    low level terminal i/o, help display
store.c   store lines
swap.c    virtual memory
term.c    (terminal) low level screen and keyboard interface



Gary Osborn
Electro Chemical Devices, inc
23665 Via Del Rio
Yorba Linda CA 92686

June 1990



[1] WordStar is a trademark of Wordstar International, 33 San Pablo Ave, San
Rafael, CA 94903.


